Why approximate Bayesian computational (ABC) methods 
cannot handle model choice problems 
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Abstract 

Approximate Bayesian computation (ABC), also known as likelihood- free methods, 
have become a favourite tool for the analysis of complex stochastic models, primarily 
in population genetics but also in financial analyses. We advocated in |Grelaud et al.| 
( 2009 1 the use of ABC for Bayesian model choice in the specific case of Gibbs random 



fields (GRF), relying on a sufficiency property mainly enjoyed by GRFs to show that 
the approach was legitimate. Despite having previously suggested the use of ABC for 



model choice in a wider range of models in the DIY ABC software (Cornuet et al 



2008), we present theoretical evidence that the general use of ABC for model choice is 



fraught with danger in the sense that no amount of computation, however large, can 
guarantee a proper approximation of the posterior probabilities of the models under 
comparison. 



Keywords: likelihood-free methods, Bayes factor, DIYABC, Bayesian model choice, 
sufficiency. 
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1 Introduction 

Inference on population genetic models such as coalescent trees is one representative ex- 
ample of cases when statistical analyses like Bayesian inference cannot operate because 
the likelihood function associated with the data is not completely known, i.e. cannot be 



computed in a manageable time (Tavare et al. , 1997, Beaumont et al. , 2002, Cornuet 



et al. , 2008). The fundamental reason for this impossibility is that the statistical model 



associated with coalescent data needs to integrate over trees of extreme complexity. 

In such settings, traditional approximation tools based on Monte Carlo simulation 



(Robert and Casella, 2004) from the Bayesian posterior distribution are unavailable for 



all practical purposes. Indeed, due to the complexity of the latent structures defining the 
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likelihood (such as the coalescent tree), simulation of those structures is too unstable to 
be trusted to bring a reliable approximation in a manageable time. Such complex models 
call for a practical if cruder approximation method, the ABC methodology being a serious 



contender, where ABC stands for approximate Bayesian computation. Tavare et al. (1997) 



and Pritchard et al. (1999) introduced ABC methods as a rejection technique bypassing 
the computation of the likelihood function via a simulation from the corresponding dis- 
tribution. For recent reviews on ABC, see Beaumont (2010) and Lopes and Beaumont 



( 2010 ). The wide and successful array of applications based on implementations of ABC in 



genomics and ecology is covered by Csillery et al. (2010a), while the number of publications 



relying on this technique runs in the hundreds. 



Pritchard et al. 



(1999) describe the use of model choice based on ABC for distin- 
guishing between different mutation models. The intuition behind the method is that 
the average ABC acceptance rate associated with a given model is proportional to the 
marginal likelihood corresponding to this approximative model, when identical summary 
statistics, distance, and tolerance level are used for all models. In practice, an estimate of 
the ratio of marginal likelihoods is given by the ratio of observed acceptance rates. Us- 
ing Bayes formula, estimates of the posterior probabilities are straightforward to derive. 



This approach has been widely used in the literature (see, e.g.. 


Estoup et al. 


2004 


Miller 


et al. 


2005 and 


Pascual et al. 


2007 


Sainudiin et al. , 2011 ). Note that Miller et al. 


(2005) 



is particularly influencial for the conclusion it derives from the ABC analysis: the focus 
of this Science paper is the European invasion of the western corn rootworm, which is 
North America's most destructive corn pest. Because this pest was initially introduced in 
Central Europe, it was believed that subsequent outbreaks in Western Europe originated 
from this area. Based on this ABC model choice analysis of the genetic variability of the 
rootworm, the authors conclude that this belief is false: There have been at least three 
independent introductions from North America during the past two decades. 

An improvement to the above estimate is due to Fagundes et al. (2007), thanks to a 
regression regularisation. In this approach, model indices are processed as categorical vari- 
ables in a formal multinomial (polychotomous) regression. For instance, when comparing 
two models, this leads to a standard logistic regression. Rejection-based approaches were 



lately introduced by Cornuet et al. (2008), Grelaud et al. (2009) and Toni et al. (2009), in 



a Monte Carlo perspective simulating model indices as well as model parameters. Those 
more recent extensions are already widely in use by the population genetics community, as 



exemphfied by Belle et al. (2008), 


Cornuet et al. (201C 


), Excoffier et al. (2009), Ghirotto 


et al. 


( 


2010), Guillemaud et al. ( 


2009), 


Leuenberger and Wegmann (2010), Patin et al. 


(2009 


), 


Ramakrishnan and Hadly 


(2009) 


Verdu et al. 


( 2009 ) , or Wegmann and Excoffier 


(2010 


)• 


Another illustration of the popularity of this approach is given by the availability 



of three three softwares implementing an ABC model choice methodology: 
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ABC-SysBicQ developped by the Theoretical Systems Biology Group at Imperial 
College London, which implements a SMC-based ABC for inference in system biol- 



ogy, including model-choice (Toni et al. , 2009). 



DIYABCQ developped by the Centre de Biologie et de Gestion des Populations, at 
INRA Montpellier, which implements a regularised ABC-MC algorithm on popula- 



tion history using molecular markers (Cornuet et al. , 2008). 



PopABCQ developped by the School of Biological Sciences at the University of 
Reading, which implements a regular ABC-MC algorithm for genealogical simulation 



(Lopes et al., 2009). 



Grelaud et al. (2009) process via ABC the specific case of Gibbs random fields with 
missing normalising constants. They establish that exact Bayesian model selection can 
be implemented in this setting, deriving this result from the property that the concate- 
nation of the sufficient statistics across models is also sufficient for model comparison. In 



a subsequent paper, Didelot et al. (2010) advocate the role of ABC approximations in 
general Bayesian model choice. The issue of sufficiency is covered in this paper, with a 
generic cross-model sufficiency completion leading the authors to validate the method in 
full generality, including in-sufficient cases. 

In this paper, we argue that ABC is a valid approximation method for conducting 
Bayesian inference in complex stochastic models, barring the limitation that it cannot 
discriminate between those complex stochastic models when based on summary statistics. 
In essence, we highlight the fact that, since ABC is conducting model choice based on 
in-sufficient statistics, the resulting inference is flawed in that the loss of information is 
severe to the point of inconsistency, namely that the ABC model selection cannot recover 
the proper model, even with an infinite amount of observation and computation. We 
demonstrate this inconsistency in the limiting (and more favourable) case of sufficient 
statistics. 

The conclusion of the current paper are thus quite negative in that we consider that 
conducting testing or model comparison using ABC does not carry any reliable weight of 
evidence and therefore should not be trusted. More empirical measures such as those pro- 



posed in Ratmann et al. (2009) and Drovandi et al. (2011) seem to be the only possibility 



at the current time for conducting model comparison. We are therefore at odds with the 
positive conclusion found in Didelot et al. ( 2010[ ), as discussed below. 

We stress here that, while Templeton (2008, 2010) repeatedly expressed reservations 



about the formal validity of the ABC approach in statistical testing, those criticisms were 
addressed at the Bayesian paradigm per se rather than at the approximation method. 
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http:/ /a bc-sysbio. sou rceforge.net 




http:/ /wwwl. montpellier. in ra.fr/ 


bsGP/diyabc 




Rttp:/ /code. google. com /p/popab| 
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Quite clearly, Templeton's criticisms got rebutted in Beaumont et al. (2010), Csillery 



eTaT] poTObl ), [Berger et al.| ( |2010| ) and are not relevant for the current paper. 



The plan of the paper is as follows: in Section [2| we recall the basics of ABC as well 
as its justification; Section |4] exposes why a Bayes factor based on an ABC approximation 
is not converging to the true Bayes factor as the computational effort increases; Section 
[5] explains the specificity of MRFs in this regard, while Section [6] illustrates the potential 
for divergence in examples. Sectoion [7] concludes the paper. 



2 The ABC approach and its justifications 



The setting in which ABC operates is the approximation of the simulation from the pos- 
terior distribution 7r(0|y) oc 7r(0)/(y|0) when both distributions associated with vr and / 



can be simulated. The first ABC algorithm was introduced by Pritchard et al. (1999) in 



a genetic setting, as follows: given a sample y from a sample space T), 



Algorithm 1 ABC sampler 
for i = 1 to iV do 
repeat 

Generate 6' from the prior distribution 7r(-) 
Generate z from the likelihood f{-\0') 
until p{7/(z),??(y)} < e 
set 0.1 = 6', 
end for 



The parameters of the ABC algorithm are the statistic r], the distance •} > 0, and 
the tolerance level e > 0. The approximation of the posterior distribution provided by the 
algorithm is that it samples from the marginal in 6 of the joint distribution 

7r,{9, z y) = -J. , (1) 

where Ib(-) denotes the indicator function of the set B and where 

A,^y = {zeV\p{7]{z),7]{y)} <e}. 

The basic justification of the ABC approximation is that, when using a sufficient statistic 
7] and a small (enough) tolerance e, we have 

7T,{e\y) = J 7re(0,z|y)dz«7r(0|y), 

the (correct) posterior distribution 7r(0|y) being the limit as e goes to zero of TTe{0\y). 
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In practice, the statistic rj is not sufficient and the approximation then converges to 
7r^(6\rj{y)). This fact is appreciated by users in the field who see this loss of information as 
an unvoidable price to pay for the access to computable quantities. While acknowledging 
the gain brought by ABC in handling Bayesian inference in complex models, we will 
demonstrate below that the loss due to the ABC approximation may be arbitrary in the 
specific setting of Bayesian model choice and testing, whether or not t] is sufficient. 



3 ABC and model choice 



Testing and model choice constitute a highly specific domain of Bayesian analysis that 
involves conceptual and computational complexification since several models are simulta- 



neously considered (Robert, 2001, Marin and Robert 2010). Given that both inferential 



problems are processed the same way in a Bayesian perspective, we will only mention 
model choice in the remainder of the paper, but the reader must bear in mind that we 
cover testing as a particular case. The standard tool on which a Bayesian approach relies 



is the evidence (Jeffreys, 1939), also called the marginal likelihood, 



[ 7r{e)f{y\e)de., 

Je 



that leads to the Bayes factor for comparing the evidences brought by the data on models 
with likelihoods /i(z|0i) and f2{z\62), 



Bi2{y) 



wijy) ^ /e,vri(0i)/i(y|0i)d0i 
W2iy) /e,vr2(02)/2(y|02)d02 



As detailed in the Bayesian literature (Berger, 1985| Robert| 2001, MacKay, 2002 Marin 
and Robert, 2010), this ratio provides an absolute criterion for model comparison that is 



naturally penalised for model complexity (Beaumont et al. , 2010, Berger et al. , 2010) and 



whose first order approximation is the Bayesian information criterion (BIC). 

Given that this issue is fundamental to our point, we recall that Bayesian model choice 
proceeds by creating a probability structure across models (or likelihoods). Namely, in 
addition to the parameters associated with each model, a Bayesian inference introduces 
the model index A4 as an extra parameter. It is associated with its own prior distribution, 
7r{A4 = m) {m = 1, . . . , M), while the prior distribution on the parameter is conditional 
on the value m of the model index, denoted by TTm{dm) and defined on the parameter 
space Qra- The choice between those models is then driven by the posterior distribution 
of 7W, 

. I . ^ AM = ra)wM 
^ '^^ Y.k<M = k)w,{y) 

where Wk{y) denotes the marginal likelihood of y for model k. 
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While this distribution is weU-defined and straightforward to interpret, it offers a 
challenging computational conundrum in Bayesian analysis. Moreover, the solutions found 



in the literature ( Chen et al. , 2000 , Marin and Robert , 2010 ) do not handle the case when 



the likelihood is not available and ABC represents the almost unique alternative. 



As exposed in e.g. Grelaud et al. (2009), Toni and Stumpf (2010), and Didelot et al. 



(2010), once Ai is incorporated within the parameters, the ABC approximation to the 
posterior follows from the same principles as regular ABC. The corresponding implemen- 
tation is as follows, using for the tolerance region a statistic r]{z) = (r/i(z), . . . , ?7a-/(z)) 
that is the concatenation of the summary statistics used for all models (with an obvious 
elimination of duplicates). 



Algorithm 2 ABC model choice sampler (ABC-MC) 
for i = 1 to do 
repeat 

Generate m from the prior Tr{M = m) 
Generate 6m from the prior TTm{dm) 
Generate z from the model /m(z|0m) 

until p{T]{z),ri{y)} < e 

Set m(*) = m and 0^^) = 0^ 
end for 



The ABC estimate of the posterior probability Tr{Ai 
acceptances from model m in the above simulation 



m|y) is then the frequency of 



— 1 ^ 



This also corresponds to the frequency of simulated pseudo-dataset from model m that 
are closer to the data y than the tolerance e. In order to improve the estimation by 



smoothing, Cornuet et al. (2008) follow the rationale that motivated the use of a local 



linear regression in Beaumont et al. ( 2002 ) and rely on a weighted polychotomous logistic 



regression to estimate 7r{A4 
software. 



m|y). This modelling is implemented in the DIYABC 



4 The difficulty with ABC-MC 

Most perspectives on ABC do not question the role of the ABC distance nor of the statistic 
77 in model choice settings. There is however a much stronger discrepancy between the 
genuine Bayes factor / posterior probability and the approximations resulting from ABC. 
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The ABC approximation to a Bayes factor, B12 say, resulting from Algorithm [2] is 



Tr{M = 2) EiIiImW=i 



An alternative representation is given by 

7r(A^ = 1) X^^iI^t=2lp{T,(z'),»7(y)}<e 

where the pairs (m*, z*) are simulated from the (joint) prior and T is the total number of 
simulations that are necessary for N acceptances in Algorithm [2j In order to study the 
limiting behaviour of this approximation, we first let T go to infinity. (For simplification 
purposes and without loss of generality, we choose a uniform prior on the model index.) 
The limit of B12 (y ) is then 



F[M = 2,p{riiz),r]iy)}<e] 

_ / JIp{r?(z),r7(y)}<€7ri (^Q/l (z|0i) dz d^i 
/Ip{r,(z),r,(y)}<e7r2(02)/2(z|02)dz d02 

_ JJtp{r,,r?(y)}<€7ri(0l)/f (T7|6'i)d?7 d^i 
/^P{r7,r7(y)}<e7r2(02)/2''('?l^2)d?7d6>2 ' 

where f^{ri\6i) and /^(?7|02) denote the distributions of ?7(z) when z ~ /i(z|0i) and 
z ~ /2(z|02)) respectively. By L'Hospital formula, if we let e go to zero, the above 
converges to 

''^^^ /vr2(02)/2^(r/(y)|02)d02' 

which is precisely and exactly the Bayes factor for testing model 1 versus model 2 based 
on the sole observation of Ti{y). This result is completely coherent with the current 
perspective on ABC, namely that the inference derived from the ideal ABC output when 
e = only uses the information contained in ?7(y). Thus, in the limiting case, i.e. when 
the ABC algorithm uses an infinite computing power, the ABC odds ratio does not take 
into account the features of the data besides the value of T7(y), which is why the limiting 
Bayes factor only depends on the distributions of r] under both models. 

In contrast with point estimation — where using a sufficient statistic has no impact on 
the inference in the limiting case — , the loss of information resulting from considering 
solely 77 seriously impacts the resulting inference on which model is best supported by 



the data. Indeed, as exhibited in a special case by Grelaud et al. (2009), the information 



contained in ri{y) is almost always smaller than the information contained in y and this 
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even in the case //(y) is a sufficient statistic for both models. In other words, ri{y) being 
sufficient for both /i(y|0i) and /2(y|02) does not usually imply that r]{y) is sufficient for 
{fn, fm{y\Gm)} ■ To see why this is the case, consider the most favourable case, namely 
when ?7(y) is a sufficient statistic for both models. We then have by the factorisation 
theorem (Lehmann and Casella, 1998) that fi{y\0i) = gi{y)fi'i'n{y)\0i)i therefore that 



^12 (y) 



wijy) 
W2{y) 

/e,vr(0i)gi(y)/f(T7(y)|0i)d0i 
/e,vr(02)52(y)/2^(^(y)|02)d02 

gi(y) J7r,{ei)f^{rjiy)\6,)de, 
92{y) /^2(02)/2^(r?(y)|02)d02 



9i[y) 
52 (y) 



(2) 



Therefore, unless gi{y) = 52 (y), the two Bayes factors differ by this ratio, 51 (y ) 752 (y ) , 
which is only equal to one in a very small number of known cases. This decomposition 
is a straightforward proof that a model-wise sufficient statistic is usually not sufficient 
across models, i.e. for model comparison. An immediate corollary is that the ABC-MC 
approximation does not converge to the exact Bayes factor. 

The discrepancy between the limiting ABC inference and the genuine Bayesian infer- 
ence does not completely come as a surprise, because ABC is indeed an approximation 
method. Users of ABC algorithms are therefore prepared for some degree of imprecision in 



their final answer, a point stressed by Wilkinson (2008) or Fearnhead and Prangle (2010) 



when they qualify ABC as exact inference on a wrong model. However, the magnitude of 
the difference between i?i2(y) and B^2{y) expressed by ^ is such that there is no direct 
connection between both answers. In a general setting, if rj has the same dimension as 
one component of the n components of y, the ratio 5i(y)/52(y) is equivalent to a density 
ratio for a sample of size 0(n), hence it can be arbitrarily small or arbitrarily large when 
n grows. On the opposite, the Bayes factor (y) is based on what is equivalent to a 
single observation, hence does not necessarily converge with n, as shown by the Poisson 
and normal examples below. The conclusion derived from one Bayes factor may therefore 
completely differ from the conclusion derived from other one and there is no possibility of 
a generic agreement between both, or even of a manageable correction factor. 

For this reason, we conclude that the ABC approach cannot be used for testing nor for 
model choice, with the exception of Gibbs random fields as explained in the next section. 
In all cases when gi{y)/g2{y) is different from one and impossible to approximate, no 
inference on the true Bayes factor can be made based on the ABC-MC approximation 
without further information on the ratio 5i(y)/s'2(y)) which is most often unavailable. 



We note that Didelot et al. (2010) also derived this relation between both Bayes factors 
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in their formula (18) but surprisingly concluded on advocating the use of ABC in complex 
models, where there are no sufficient statistics. We disagree with this perspective for 
reasons that will be made clear in the following sections. 

5 The special case of Gibbs random fields 

Grelaud et al.| ( |2009[ ) showed that, for Gibbs random fields and in particular for Potts 
models, when the goal is to compare several neighbourhood structures, the computation 
of the posterior probabilities of the models / structures under competition can be operated 
by likelihood- free simulation techniques, in the sense that there exists a converging ap- 
proximation to the true Bayes factor. The reason for this property is that, in the above 
ratio, 5'i(y) = 52 (y) in this special model. 

Indeed, if we consider a Gibbs random field given by the likelihood function 

/(y|^) = ^exp{0T^(y)}, 

where y is a vector of dimension n taking values over the finite set X (possibly a lattice), 
r]{-) is the potential function defining the random field, taking values in M^, 9 G MP is 
the associated parameter, and Zq is the corresponding normalising constant, the potential 
function r/ is a sufficient statistic for the model. For instance, in Potts models, the sufficient 
statistic is the number of neighbours, 

viy) = ^hy,=yA^ 

associated with a neighbourhood structure denoted by i ~ i' (meaning that i and i' are 
neighbours). 

The property that validates an ABC resolution for the comparison of Gibbs random 
fields is that, due to their specific structure, there exists a sufficient statistic vector that 
runs across models and which allows for an exact (when e = 0) simulation from the 
posterior probabilities of the models. More specifically, consider M Gibbs random fields 
in competition, each one being associated with a potential function r/^ (1 < m < M), 
i.e. with corresponding likelihood 

fm{y\Om) =exp{0j^r/„(y)}/Z0„,m, 

where 6m G ©m and ^0„,m is the unknown normalising constant. A Bayesian analysis 
operates on the extended parameter space = U^^j^jm} x that includes both the 
model index and the corresponding parameter space Qm- The inferential target is thus 
the model posterior probability 

P(A^=m|y)0C / fm{y\9m)T^ra{0m) <^OmT^{M = m) , 
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i.e. the marginal in Ai of the posterior distribution on {Ai,Oi, . . . ,9m) given y. Each 
model has its own sufficient statistic rjm{-)- Then, for each model, the vector of statistics 
vi') = (^i(')) • ■ ■ ) '?w(0) is clearly sufficient; furthermore Grelaud et al. (2009) exposed 
the fact that ij is also sufficient for the joint parameter {A4, 9i, . . . , 9m)- That this con- 
catenation of sufficient statistics is jointly sufficient across models is a property that is 
rather specific to Gibbs random field models, at least from a practical perspective (see 
below) . Figure [l] shows an experiment from Grelaud et al. ( 2009 ) concluding rightly at 
the agreement between the exact Bayes factor and an ABC approximation. 



25 
O 
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00 
d 
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0.8 
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Figure 1: Comparison between the true Bayes factor and the ABC approximation in a 



Markov model selection of Grelaud et al.| ( 2009 ) , based on 2, 000 simulated sequences and 
4 X 10^ proposals from the prior. The solid/red line is the diagonal. (Source: Grelaud 



Didelot et al. (2010) point out that this specific property of Gibbs random fields can 



be extended to any exponential family (hence to any setting enjoying sufficient statistics. 



see e.g. Casella and Berger, 2001). Their argument is an encompassing property: by 



including all sufficient statistics and all dominating measure statistics in an encompassing 
model, models under comparison become submodels of the encompassing model. They 
then conclude that the concatenation of those statistics is jointly sufficient across models. 
While this encompassing principle holds in full generality, in particular when comparing 
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models that are already embedded, we think it leads to a biased perspective about the 
merits of ABC for model choice: in practice, complex models do not enjoy sufficient 
statistics (if only because they are not exponential families). As demonstrated in the 
next section, there is more than a mere loss of information due to the use of insufficient 
statistics and looking at what happens in the limiting case when one is relying on a 
common sufficient statistic is a formal study that brings light on the potentially huge 
discrepancy between the ABC-based Bayes factor and the true Bayes factor. To study a 
solution to the problem in the formal case of the exponential families does not help in the 
understanding of the discrepancy in non-exponential models. 



6 Arbitrary ratios 

The difficulty with the arbitrary discrepancy between i?i2(y) and i?^2(y) is that it is 
impossible to evaluate in a general setting, while there is no reason to expect a reasonable 



agreement between both quantities. A first illustration was produced by Marin et al 



(2011) in the setting of MA{q) time series: a simulation experiment showed that, when 
comparing an AIA{2) with an MA{1) model, the ABC approximation to the Bayes factor 
was stable (around 2.3) as e decreases, remaining far from the true Bayes factor 17.7 for 
an MA(2) simulated sample, while the approximation was 0.25 against a true value of 
0.004 in the case of a simulated MA(1) sample. 

6.1 A Poisson-negative binomial illustration 

As a first illustration of the discrepancy due to the use of a sufficient statistic, consider 
the simple case when a sample y = {yi, . . . ,yn) could come from either a Poisson V{X) 



distribution or from a geometric G{p) distribution, already introduced in Grelaud et al. 



( 2009 ) as a counter-example to Gibbs random fields and later reprocessed in Didelot et al. 

(2010) to support their sufficiency argument. In this setting, the sum S = yi = viy) 
is a sufficient statistic for both models but not across models. The distribution of the 
sample given S" is a multinomial A4{S, 1/n, . . . , 1/n) distribution when the data is Poisson, 
since S is then a Poisson V{nX) variable, while it is the uniform distribution with constant 
probability 

1 , _ Sl{n-l)l 

^n+s~Y^^y^=3 (n + S-l)!^'^'=^ 

in the geometric case, since S is then a negative binomial Meg{n,p) variable. The dis- 
crepancy ratio is therefore 

51 (y) _ S\n-S/U^y^^■ 



92{y) 



I ■ 

s 
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When simulating n Poisson or geometric variables and using prior distributions 



A -£:(!), p^U{0,l), 

on the respective models, the exact Bayes factor can be evaluated and the range and 
distribution of the discrepancy are therefore available. Figure [2] gives the range of i?i2(y) 
versus i?^2(y)) showing that B^gly) is in this case absolutely un-related with i3i2(y): The 
values produced by both approaches simply have nothing in common. As noted above, 
the approximation S^glx) based on the sufficient statistic S is producing figures of the 
magnitude of a single observation, while the true Bayes factor is of the order of the sample 
size. 




— I — I — I — I — i — i — \ — 1 1 n 

5 10 IS 20 2S 30 3S -ISO -100 -SO 

Figure 2: Comparison between the true log-Bayes factor (first axis) for the comparison of 
a Poisson model versus a negative binomial model and of the log-Bayes factor based on 
the sufficient statistic Y^^Vi (second axis), for Poisson (left) and negative binomial (left) 
samples of size n = 50, based on T = 10^ replications. 

The discrepancy between both Bayes factors is in fact increasing with the sample size, 
as shown by the following result: 

Lemma 1. Consider performing model selection between model 1: V{\) with prior dis- 
tribution 7ri(A) equal to an <?(1) distribution and model 2: Q{p) with a uniform prior 
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distribution n2 when the observed data y consists of iid observations with = > 0. 
Then S{y) = Y17=i Hi '^^ minimal sufficient statistic for both models and the Bayes 
factor based on the sufficient statistic S{y), i?^2(y)' satisfies 

lim B^y) = ^-^^^e-'" a.s. 

Therefore, the Bayes factor based on the sufficient statistic S{y) is not consistent; it 
converges to a non-zero, finite value almost surely. 

Proof. Under model 1, we have S ~ V{nX), with corresponding likelihood 
The marginal likelihood of S under the prior vri is then 



oo 



e^^dA = - / — — ^dX 



T{S + l)n-S S Jo T{S)n-S 

1 1/ l\-s 



Sin + l)S - S\^^ n) ■ 

Under model 2, the sufficient statistic has a negative binomial distribution, S ~ Neg(n,p) 
and thus 

^ + ^ -A^Sn _ r(5' + n) g 



The corresponding marginal likelihood under the prior tt2 is 
r{S + n) 



T{S + 



n 



(4) 



(5 + n + l)(5 + n) ' 

Therefore from ([s]) and Q, the Bayes factor based on the sufficient statistic is given by 

Since the j/j's are iid with mean 9q, the Law of Large Numbers implies that S/n — )■ 6q 
almost surely, thus 

{S + n){S + n + l) {00 + 1)^ 



lim 



S n 9q 
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since > 0. Furthermore, 



lim 1 + - 



lim e-^l°g(i+i/") 



Thus from we deduce that 

hm -B^2(y) = 

proving the result. 



9o + l) 



□ 



In this specific setting, Didelot et al. (2010) show that adding -P = Hi t° suffi- 



cient statistic S induces a statistic (5, P) that is sufficient across both models. While this 
is a mathematically correct observation, we think it is not helpful for the understanding 
of the behaviour of ABC-model choice in realistic settings: outside toy examples as the 
one above and well-structured although complex exponential families like Gibbs random 
fields, it is not possible to come up with completion mechanisms that ensure sufficiency 
across models and it is therefore more fruitful to consider the diverging behaviour of the 
ABC approximation as given, rather than attempting at solving the problem. 



6.2 A normal illustration 

First, note that, given a one-dimensional sufficient statistic S = Ti{y), the functions gi{y) 
and (72 (y) can on principle be anything. For instance, 

n 

51 (y) = H'^^^/i - S\al)Ij2^y^=^s 

i=l 

and 

n 

52 (y) = n^(y^ ~ S\a^)^T.,yi=nS 

i=l 

is a possible model. In other words, by a reparameterisation of the models, we could 
observe y = (yi, . . . , S) with 

yi, . . . ,y„„i|S' ~ 7V(5, fJi) and yi, . . . , ~ AA(5, erf) , 

this independently of the distributions of S under both models. (This means that we 
can find two competing models where the distributions of S are not connected with fji 
nor with £72.) Because they depend on the choice of those distributions, the true Bayes 
factor and the ABC-Bayes factor are unrelated and may as well diverge from one another. 
Admitedly, this construct is artificial in that there is no clear statistical setting when this 
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could occur, but the construct is both mathematically valid and informative about the 
lack of control over the diverging factor 5i(y)/5'2(y)- 
If we look at a fully normal N{ijl, a"^) setting, we have 

/(y|/x) oc exp | -na-\y - ixf /2 - - yf /2 I a'^ 



hence 



Vi=ny ■ 



f{y\y) oc exp ^(y, - yf ^-1^,, 

If we reparameterise the observations into u = (yi — y, . . . , yn-i — VtV)-, we do get 
/(u|/x) oc a-"exp{-na-2(y-/x)V2} 



X exp < 

since the Jacobian is 1. Hence 



n— 1 



i=l 



n— 1 

.1=1 



/2 



n— 1 



/(u|y) oc exp <^ -a ^ ^ n,V2 - a" 



1=1 



n-l 
.i=l 



/2\a- 



Considering both models 

yi,---,yn ~ A/"(/x,(7i) and yi,...,yn'^ M{n,al). 
the discrepancy ratio is then given by 



guy J 
52 (y) 



exp \ -CJ^^ ^-^\yi - yfl2 - at' T^UiVi - v) l^) 



n-l, 



-n+1 



n-l, 



-n+1 



a. 



n-l 



n-l 



exp < 



(Jo —a 



-2 I n-l 



i=l 



n-l 



^ivi - y) 



and is connected with the lack of consistency of the Bayes factor: 

Lemma 2. Consider performing model selection between model 1: Af{iJ,,(Ji) and model 2: 
A/"(jU, o"!)) ci o,nd o"2 being given, with prior distributions 'Ki{p) = vr2(/i) equal to aj\f{0, a^) 
distribution and when the observed data y consists of iid observations with finite mean and 
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variance. Then S{'y) = Y17=i Vi minimal sufficient statistic for both models and the 

Bayes factor based on the sufficient statistic S{y), i?^2(y); satisfies 

lim BVJy) = 1 a.s. 
Proof. The marginal likelihood associated with S{y) and the prior fj, ~ A/'(0, a^) is 



hence leading to the Bayes factor 





exp < 




1 




I 2(a2 


+ ^IM J 




exp < 




1 




I 2(a2 


+ ^i/n) J 



n(j2 ^ + a 2 
ncjf ^ + a~2 



which indeeds converges to 1 as n goes to infinity. □ 

Figure [3] illustrates the behaviour of the discrepancy ratio when ai = 0.1 and (T2 = 10, 
for datasets of size n = 15 simulated according to both models. The discrepancy (expressed 
on a log scale) is once again dramatic, in concordance with the above lemma. 

If we now turn to an alternative choice of sufficient statistic, using the pair {y, 52) with 

n 



we follow the solution of Didelot et al. (2010). Using a conjugate prior ^ ~ AA(0, a^), the 
true Bayes factor is given by 



^ exp{-5V2a2} exp{-^2/2(Q2 ^ ^2/^)1 ^ a-^ + a^^n 
exp{-52/2a2} exp{-y2/2(a2 + al/n)} ^^-2 + ^-2^ ' 

and it is equal to the Bayes factor based on the corresponding distributions of the pair 
(y, S'^) in the respective models. Again, we do not think this coincidence brings the proper 
light on the behaviour of the ABC approximations in realistic settings. 
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Figure 3: Empirical distributions of the log discrepancy log gi (y ) / g2 (y ) for datasets of size 
n = 15 simulated from AA(/i, o"^) (left) and AA(/i, al) (right) distributions when ai = 0.1 
and (T2 = 10, based on 10^ replications and a flat prior. 
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7 Conclusion 



Since its introduction by Tavare et al. (1997) and Pritchard et al. (1999), ABC has been 
extensively used in several areas involving complex likelihoods, primarily in population 
genetics. In those domains, ABC has been used both for point estimation and testing of 
hypotheses. In realistic settings, with the exception of Gibbs random fields that satisfy 
a resilience property with respect to their sufficient statistics, the conclusions drawn on 
model comparison cannot alas be trusted per se but require further analyses as to the 
pertinence of the (ABC) Bayes factor based on the summary statistics. This paper has 
only examined in details the case when the summary statistics are sufficient for both 
models, while practical situations imply the use of in-sufficient statistics, and further re- 
search is needed for the latter case. However, this practical situation implies a wider loss 
of information compared with the exact inferential approach, hence a wider discrepancy 
between the exact Bayes factor and the quantity produced by an ABC approximation. 
It thus appears to us an urgent duty to warn the community about the dangers of this 
approximation, especially when considering the rapidly increasing number of applications 
using ABC for conducting model choice and hypothesis testing. As a final (and negative) 
point, we unfortunately do not see an immediate and generic alternative for the approx- 
imation of Bayes factors because importance sampling techniques are suffering from the 
same difficulty, namely they only depend on the summary statistics. 

As a final remark, we note that Sousa et al. (2009) advocate the use of full allelic 
distributions in an ABC framework, instead of resorting to summary statistics. They show 
that it is possible to apply ABC using allele frequencies to draw inferences in cases where 
it is difficult to select a set of suitable summary statistics (and when the complexity of 
the model or the size of dataset makes it computationally prohibitive to use full-likelihood 
methods). In such settings, were we to consider a model choice problem, the divergence 
exhibited in the current paper would not occur because the measure of distance does not 
rely on a reduction of the sample. 
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